AITopics | bayesian robust optimization

Bayesian Robust Optimization for Imitation Learning

Neural Information Processing Systems | Dec-23-2025, 19:43:29 GMT

One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and the corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic, as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user-specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.
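To make the trade-off between expected return and conditional value at risk (CVaR) concrete, here is a minimal sketch that scores a fixed policy from its return under each sampled reward hypothesis. The convex-combination weighting `lam * mean + (1 - lam) * CVaR`, the names `cvar_lower_tail` and `blended_objective`, and the parameters `alpha` and `lam` are assumptions made for illustration; the abstract only states that the two quantities are balanced.

```python
import numpy as np

def cvar_lower_tail(returns, probs, alpha):
    """CVaR of the worst (1 - alpha) probability mass of a discrete return distribution.

    Uses the Rockafellar-Uryasev form for a quantity we want to maximize:
        max over sigma of  sigma - E[(sigma - X)_+] / (1 - alpha)
    The objective is piecewise linear and concave in sigma, so a maximizer
    lies at one of the sample values.
    """
    returns = np.asarray(returns, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # shortfall[j, i] = max(sigma_j - x_i, 0) for each candidate sigma_j
    shortfall = np.maximum(returns[:, None] - returns[None, :], 0.0)
    values = returns - shortfall @ probs / (1.0 - alpha)
    return float(values.max())

def blended_objective(returns, probs, alpha, lam):
    """Assumed blend: lam * expected return + (1 - lam) * CVaR_alpha."""
    expected = float(np.dot(probs, returns))
    return lam * expected + (1.0 - lam) * cvar_lower_tail(returns, probs, alpha)

# Hypothetical returns of one policy under four equally likely reward samples.
returns = [0.0, 1.0, 2.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]

risk_neutral = blended_objective(returns, probs, alpha=0.5, lam=1.0)  # mean only
risk_averse = blended_objective(returns, probs, alpha=0.5, lam=0.0)   # CVaR only
```

With `lam=1.0` the score reduces to the mean return, and with `lam=0.0` it reduces to the average of the worst half of the outcomes, so intermediate values of `lam` interpolate between the two behaviors.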

bayesian robust optimization, imitation learning, reward function, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Supplementary Materials for Bayesian Robust Optimization for Imitation Learning
Daniel S. Brown

Neural Information Processing Systems | Oct-2-2025, 07:42:55 GMT

When using the robust performance metric described in Section 4.2, we have [...] We solve the above linear program to obtain the results presented in Section 5.1. [...] Work done while at UT Austin. [...] We use SciPy's linear programming software (v1.4.1) [...] MDP is solved to obtain the sample's likelihood and determine the transition probabilities within the Markov chain. [...] We used a learning rate of 0.01.
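The linear program itself is not recoverable from this excerpt. As a rough illustration of how a CVaR-based objective can be handed to SciPy's `linprog`, the sketch below solves a deliberately simplified single-state problem: choose a distribution `w` over actions given a matrix `R` of action rewards under sampled reward hypotheses. The single-state setting, the function name `solve_cvar_lp`, the weighting `lam`, and the example numbers are all assumptions and are not taken from the supplementary material.

```python
import numpy as np
from scipy.optimize import linprog

def solve_cvar_lp(R, p, alpha, lam):
    """Maximize lam * E[R w] + (1 - lam) * CVaR_alpha[R w] over action distributions w.

    R: (N, A) reward of each action under each sampled reward hypothesis.
    p: (N,) probability of each hypothesis.
    Decision vector x = [w (A entries), sigma (1 entry), z (N entries)].
    """
    R = np.asarray(R, dtype=float)
    p = np.asarray(p, dtype=float)
    n, a = R.shape
    # linprog minimizes, so negate the objective we want to maximize:
    # lam * p^T R w + (1 - lam) * (sigma - p^T z / (1 - alpha))
    c = np.concatenate([-lam * (p @ R), [-(1.0 - lam)], (1.0 - lam) / (1.0 - alpha) * p])
    # Shortfall constraints: sigma - R_i . w - z_i <= 0 for every hypothesis i
    A_ub = np.hstack([-R, np.ones((n, 1)), -np.eye(n)])
    b_ub = np.zeros(n)
    # w must sum to one
    A_eq = np.concatenate([np.ones(a), [0.0], np.zeros(n)])[None, :]
    b_eq = [1.0]
    # w >= 0, sigma free, z >= 0
    bounds = [(0, None)] * a + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:a]

# Hypothetical example: action 0 is safe, action 1 has a higher mean but a bad tail.
R = [[1.0, 4.0], [1.0, 4.0], [1.0, 4.0], [1.0, -4.0]]
p = [0.25, 0.25, 0.25, 0.25]

w_mostly_mean = solve_cvar_lp(R, p, alpha=0.75, lam=0.9)  # favors the risky action
w_cvar_only = solve_cvar_lp(R, p, alpha=0.75, lam=0.0)    # favors the safe action
```

In this toy problem, weighting the mean heavily puts all probability on the higher-mean action, while optimizing CVaR alone puts all probability on the action whose reward is the same under every hypothesis.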

Review for NeurIPS paper: Bayesian Robust Optimization for Imitation Learning

Neural Information Processing Systems | Jan-22-2025, 05:18:32 GMT

Clarity: Overall, I think the paper is fairly well written. I understand that the authors are working within the page restrictions of the conference. With that said, I think there is substantial room for improvement in the paper's presentation. First, I think there are more specific ways to describe the contributions (copied from summary): 1) a linear programming formulation to compute the optimal policy for CVaR; 2) a demonstration of how to use this formulation to implement robust policy optimization under a prior and robust imitation learning; 3) favorable comparisons with existing risk-sensitive and risk-neutral algorithms in both settings. Right now I think that the description of the contributions hides the most useful contribution.

bayesian robust optimization, imitation learning, notation, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.38)

Review for NeurIPS paper: Bayesian Robust Optimization for Imitation Learning

Neural Information Processing Systems | Jan-22-2025, 05:18:24 GMT

This is a nice paper on robust/safe imitation learning. It could be improved by linking it to work on Bayesian safe exploration for RL. After further discussion, all reviewers agree that it should be accepted.

bayesian robust optimization, imitation learning, neurips paper

Technology:

Information Technology > Artificial Intelligence > Robots (0.90)
Information Technology > Artificial Intelligence > Machine Learning (0.90)

Bayesian Robust Optimization for Imitation Learning

Neural Information Processing Systems | Oct-9-2024, 16:07:13 GMT

One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and the corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic, as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL).

bayesian robust optimization, imitation learning, reward function, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
